Estimating and Clustering Curves in the Presence of Heteroscedastic Errors

Author

  • Nicoleta Serban
Abstract

The technique introduced in this paper is a means for estimating and discovering underlying patterns in a large number of curves observed with heteroscedastic errors. Both the mean and the variance functions of each curve are assumed unknown and varying over time. The method consists of a series of steps. We first transform the data using an orthonormal basis of functions in L2. In the transform domain, the nonparametric regression problem reduces to a means model. To estimate the means in the transform domain, we consider the class of linear, or modulation, estimators and proceed as in Beran and Dümbgen (1998) by minimizing Stein's unbiased risk estimate. By minimizing the risk over a nested subset selection of modulators, we reduce the dimensionality of the means space. We show that, in the transform space, the risk estimate is asymptotically optimal in Pinsker's minimax sense over Sobolev ellipsoids under heteroscedastic errors. Coefficient estimation and dimensionality reduction via optimal risk estimation are essential for accurate estimation of cluster membership. We illustrate our technique by estimating and clustering a large number of curves, both in a synthetic example and in a specific application. In this application, we analyze the research and development expenditure of a subset of companies in the Compustat Global database. We show that our method compares favorably to two alternative approaches.
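The estimation steps described in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation, and it rests on several assumptions that are not in the abstract: a discrete cosine basis stands in for the orthonormal basis, the error variance at each time point is treated as known (the paper treats it as unknown), and the nested subset selection is taken to mean "keep the first k coefficients". All function names are hypothetical.

```python
import math
import random


def cosine_basis(n):
    """Orthonormal discrete cosine basis on n equally spaced points (rows = basis vectors)."""
    basis = []
    for j in range(n):
        scale = math.sqrt(1.0 / n) if j == 0 else math.sqrt(2.0 / n)
        basis.append([scale * math.cos(math.pi * j * (i + 0.5) / n) for i in range(n)])
    return basis


def transform(y, basis):
    """Coefficients z_j = <phi_j, y>; in this domain the regression becomes a means model."""
    return [sum(b * v for b, v in zip(row, y)) for row in basis]


def coefficient_variances(noise_var, basis):
    """Var(z_j) = sum_i phi_j(i)^2 * sigma_i^2, assuming independent errors."""
    return [sum(b * b * v for b, v in zip(row, noise_var)) for row in basis]


def select_dimension(z, var_z):
    """Minimize an unbiased risk estimate over nested 'keep first k' modulators.

    risk(k) = sum_{j<k} var_j + sum_{j>=k} (z_j^2 - var_j)
    """
    risk = sum(zj * zj - vj for zj, vj in zip(z, var_z))  # k = 0: drop everything
    best_k, best_risk = 0, risk
    for k in range(1, len(z) + 1):
        # Moving coefficient k-1 from "dropped" to "kept" changes the risk estimate.
        risk += var_z[k - 1] - (z[k - 1] ** 2 - var_z[k - 1])
        if risk < best_risk:
            best_k, best_risk = k, risk
    return best_k, best_risk


def estimate_curve(y, noise_var):
    """Return (k, kept coefficients, fitted values) for one observed curve."""
    n = len(y)
    basis = cosine_basis(n)
    z = transform(y, basis)
    var_z = coefficient_variances(noise_var, basis)
    k, _ = select_dimension(z, var_z)
    fitted = [sum(z[j] * basis[j][i] for j in range(k)) for i in range(n)]
    return k, z[:k], fitted


if __name__ == "__main__":
    random.seed(0)
    n = 64
    t = [(i + 0.5) / n for i in range(n)]
    sd = [0.1 + 0.4 * ti for ti in t]  # error standard deviation grows over time
    y = [math.sin(2 * math.pi * ti) + random.gauss(0.0, s) for ti, s in zip(t, sd)]
    k, coeffs, fitted = estimate_curve(y, [s * s for s in sd])
    print("selected dimension:", k)
```

Under these assumptions, the kept coefficients of each curve form a low-dimensional vector, and a standard clustering routine could then be run on those vectors to group the curves.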

Similar resources

Clustering Curves in the Presence of Heteroscedastic Errors

The clustering technique introduced in this paper is a means for discovering underlying patterns among a large number of curves. One novel characteristic compared to the current clustering methods is that we allow for heteroscedastic errors. Both the mean and the variance functions of each curve are assumed unknown and varying over time. The clustering method consists of a series of steps: tran...

On Presentation a new Estimator for Estimating of Population Mean in the Presence of Measurement error and non-Response

Introduction: According to classical sampling theory, the errors mainly considered in estimation are sampling errors. However, most non-sampling errors affect the properties of estimators more than sampling errors do. This has been confirmed by researchers over the past two decades, especially in relation to non-response errors that are one of the most fundamental non-immolation...

A robust wavelet based profile monitoring and change point detection using S-estimator and clustering

Some quality characteristics are well defined when treated as response variables and are related to some independent variables. This relationship is called a profile. Parametric models, such as linear models, may be used to model profiles. However, in practical applications, due to the complexity of many processes, it is not usually possible to model a process using parametric models. In these cas...

Wavelet designs for estimating nonparametric curves with heteroscedastic error

In this paper, we discuss the problem of constructing designs in order to maximize the accuracy of nonparametric curve estimation in the possible presence of heteroscedastic errors. Our approach is to exploit the flexibility of wavelet approximations to approximate the unknown response curve by its wavelet expansion thereby eliminating the mathematical difficulty associated with the unknown s...

A New Method for Duplicate Detection Using Hierarchical Clustering of Records

Accuracy and validity of data are prerequisites for the proper operation of any software system. There is always a possibility of errors occurring in data due to human and system faults. One such error is the existence of duplicate records in data sources. Duplicate records refer to the same real-world entity. Only one of them should exist in a data source, but for some reasons like aggregation of ...

Publication date: 2006